Goto

Collaborating Authors

 Storrs



FedGMark: Certifiably Robust Watermarking for Federated Graph Learning Yuxin Yang 1,2 Qiang Li1 Yuan Hong 3 Binghui Wang

Neural Information Processing Systems

Federated graph learning (FedGL) is an emerging learning paradigm to collaboratively train graph data from various clients. However, during the development and deployment of FedGL models, they are susceptible to illegal copying and model theft. Backdoor-based watermarking is a well-known method for mitigating these attacks, as it offers ownership verification to the model owner. We take the first step to protect the ownership of FedGL models via backdoor-based watermarking. Existing techniques have challenges in achieving the goal: 1) they either cannot be directly applied or yield unsatisfactory performance; 2) they are vulnerable to watermark removal attacks; and 3) they lack of formal guarantees. To address all the challenges, we propose FedGMark, the first certified robust backdoor-based watermarking for FedGL. FedGMark leverages the unique graph structure and client information in FedGL to learn customized and diverse watermarks. It also designs a novel GL architecture that facilitates defending against both the empirical and theoretically worst-case watermark removal attacks.


HyperGCL: Multi-Modal Graph Contrastive Learning via Learnable Hypergraph Views

arXiv.org Artificial Intelligence

--Recent advancements in Graph Contrastive Learning (GCL) have demonstrated remarkable effectiveness in improving graph representations. However, relying on predefined augmentations (e.g., node dropping, edge perturbation, attribute masking) may result in the loss of task-relevant information and a lack of adaptability to diverse input data. Furthermore, the selection of negative samples remains rarely explored. In this paper, we introduce HyperGCL, a novel multimodal GCL framework from a hypergraph perspective. HyperGCL constructs three distinct hypergraph views by jointly utilizing the input graph's structure and attributes, enabling a comprehensive integration of multiple modalities in contrastive learning. A learnable adaptive topology augmentation technique enhances these views by preserving important relations and filtering out noise. View-specific encoders capture essential characteristics from each view, while a network-aware contrastive loss leverages the underlying topology to define positive and negative samples effectively. Extensive experiments on benchmark datasets demonstrate that HyperGCL achieves state-of-the-art node classification performance. Building on the success of contrastive learning (CL) in computer vision and natural language processing [1], [2], CL approaches have been extended to graph data--known as Graph Contrastive Learning (GCL)--where Graph Neural Networks (GNNs) learn robust representations by maximizing agreement between augmented graph views [3]-[6]. First, they often depend on handcrafted augmentations such as node dropping, edge perturbation, and attribute masking.


Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps

arXiv.org Artificial Intelligence

This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data, including GPT2, SmolLM2, OpenELM, TinyLlama, Stable LM, and Gemma 2. We identify a critical parameter threshold (~1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning. Specifically, models above this threshold achieve better success rates in chain-of-thought (CoT) prompting for deductive reasoning tasks, especially those requiring longer reasoning chains, such as proof by contradiction and disjunction elimination. To address limitations in sub-threshold models, we demonstrate that fine-tuning with task-specific exemplars substantially enhances reasoning performance, enabling accurate CoT generation even without additional exemplars in the prompt for tasks with shorter reasoning chains. Finally, our analysis of attention maps reveals that models capable of generating correct CoTs exhibit higher token-level attention scores on subsequent correct tokens and the correct parts of speech, providing interpretability insights into reasoning processes. These findings collectively advance understanding of reasoning capabilities in decoder-only transformer-based models. The code is available at: https://github.com/AnnonymousForPapers/CoT_Reasoning_Test.


Learning Euler Factors of Elliptic Curves

arXiv.org Artificial Intelligence

We apply transformer models and feedforward neural networks to predict Frobenius traces $a_p$ from elliptic curves given other traces $a_q$. We train further models to predict $a_p \bmod 2$ from $a_q \bmod 2$, and cross-analysis such as $a_p \bmod 2$ from $a_q$. Our experiments reveal that these models achieve high accuracy, even in the absence of explicit number-theoretic tools like functional equations of $L$-functions. We also present partial interpretability findings.


Mathematical Data Science

arXiv.org Artificial Intelligence

In this article we discuss an approach to doing this which one can call mathematical data science. In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets and doing machine learning experiments and interpretations. Broadly speaking, the field of data science is concerned with assembling, curating and analyzing large datasets, and developing methods which enable its users to not just answer predetermined questions about the data but to explore it, make simple descriptions and pictures, and arrive at novel insights. This certainly sounds promising as a tool for mathematical discovery! Mathematical data science is not new and has historically led to very important results. A famous example is the work of Birch and Swinnerton-Dyer leading to their conjecture [BSD65], based on computer generation of elliptic curves and linear regression analysis of the resulting data. However, the field really started to take off with the deep learning revolution and with the easy access to ML models provided by platforms such as Py-Torch and TensorFlow, and built into computer algebra systems such as Mathematica, Magma and SageMath.


Uncertainty Quantification of Wind Gust Predictions in the Northeast US: An Evidential Neural Network and Explainable Artificial Intelligence Approach

arXiv.org Machine Learning

Machine learning has shown promise in reducing bias in numerical weather model predictions of wind gusts. Yet, they underperform to predict high gusts even with additional observations due to the right-skewed distribution of gusts. Uncertainty quantification (UQ) addresses this by identifying when predictions are reliable or needs cautious interpretation. Using data from 61 extratropical storms in the Northeastern USA, we introduce evidential neural network (ENN) as a novel approach for UQ in gust predictions, leveraging atmospheric variables from the Weather Research and Forecasting (WRF) model as features and gust observations as targets. Explainable artificial intelligence (XAI) techniques demonstrated that key predictive features also contributed to higher uncertainty. Estimated uncertainty correlated with storm intensity and spatial gust gradients. ENN allowed constructing gust prediction intervals without requiring an ensemble. From an operational perspective, providing gust forecasts with quantified uncertainty enhances stakeholders' confidence in risk assessment and response planning for extreme gust events.


Machine Learning Mutation-Acyclicity of Quivers

arXiv.org Artificial Intelligence

Machine learning (ML) has emerged as a powerful tool in mathematical research in recent years. This paper applies ML techniques to the study of quivers--a type of directed multigraph with significant relevance in algebra, combinatorics, computer science, and mathematical physics. Specifically, we focus on the challenging problem of determining the mutation-acyclicity of a quiver on 4 vertices, a property that is pivotal since mutation-acyclicity is often a necessary condition for theorems involving path algebras and cluster algebras. Although this classification is known for quivers with at most 3 vertices, little is known about quivers on more than 3 vertices. We give a computer-assisted proof of a theorem to prove that mutation-acyclicity is decidable for quivers on 4 vertices with edge weight at most 2. By leveraging neural networks (NNs) and support vector machines (SVMs), we then accurately classify more general 4-vertex quivers as mutation-acyclic or non-mutation-acyclic. Our results demonstrate that ML models can efficiently detect mutation-acyclicity, providing a promising computational approach to this combinatorial problem, from which the trained SVM equation provides a starting point to guide future theoretical development.


Efficient transformer with reinforced position embedding for language models

arXiv.org Artificial Intelligence

In this paper, we propose an efficient transformer architecture that uses reinforced positional embedding to obtain superior performance with half the number of encoder decoder layers. We demonstrate that concatenating positional encoding with trainable token embeddings, normalizing columns in the token embedding matrix, and using the normalized token embedding matrix as the value of the attention layer improve the training and validation loss and the training time in an encoder-decoder Transformer model for a Portuguese-English translation task with 10 epochs or 12 hours of training across 10 trials. Our method, with roughly a threefold parameter reduction compared to the baseline model, yields a mean training loss of 1.21, a mean validation loss of 1.51, and an average training time of 1352.27 Additionally, we evaluated our proposed architecture and the baseline across 14 diverse translation datasets from TensorFlow. The results indicate that our method consistently achieves lower or comparable training and validation losses, suggesting enhanced learning efficiency.


Explainable Hyperdimensional Computing for Balancing Privacy and Transparency in Additive Manufacturing Monitoring

arXiv.org Artificial Intelligence

In-situ sensing, in conjunction with learning models, presents a unique opportunity to address persistent defect issues in Additive Manufacturing (AM) processes. However, this integration introduces significant data privacy concerns, such as data leakage, sensor data compromise, and model inversion attacks, revealing critical details about part design, material composition, and machine parameters. Differential Privacy (DP) models, which inject noise into data under mathematical guarantees, offer a nuanced balance between data utility and privacy by obscuring traces of sensing data. However, the introduction of noise into learning models, often functioning as black boxes, complicates the prediction of how specific noise levels impact model accuracy. This study introduces the Differential Privacy-HyperDimensional computing (DP-HD) framework, leveraging the explainability of the vector symbolic paradigm to predict the noise impact on the accuracy of in-situ monitoring, safeguarding sensitive data while maintaining operational efficiency. Experimental results on real-world high-speed melt pool data of AM for detecting overhang anomalies demonstrate that DP-HD achieves superior operational efficiency, prediction accuracy, and robust privacy protection, outperforming state-of-the-art Machine Learning (ML) models. For example, when implementing the same level of privacy protection (with a privacy budget set at 1), our model achieved an accuracy of 94.43%, surpassing the performance of traditional models such as ResNet50 (52.30%), GoogLeNet (23.85%), AlexNet (55.78%), DenseNet201 (69.13%), and EfficientNet B2 (40.81%). Notably, DP-HD maintains high performance under substantial noise additions designed to enhance privacy, unlike current models that suffer significant accuracy declines under high privacy constraints.